Skip to content

fix(e2e): fix runtime tests for operator deployments and CI reporting#5027

Merged
openshift-merge-bot[bot] merged 11 commits into
redhat-developer:mainfrom
zdrapela:fix/runtime-tests-operator-externaldb
Jul 3, 2026
Merged

fix(e2e): fix runtime tests for operator deployments and CI reporting#5027
openshift-merge-bot[bot] merged 11 commits into
redhat-developer:mainfrom
zdrapela:fix/runtime-tests-operator-externaldb

Conversation

@zdrapela

@zdrapela zdrapela commented Jun 29, 2026

Copy link
Copy Markdown
Member

Follow-up to #4809. Fixes issues found during CI verification of the runtime tests refactor.

Changes

1. WebSocket teardown hang (about:blank in afterEach)

Move page.goto('about:blank') from per-test try/finally blocks to test.afterEach hooks. RHDH UI establishes WebSocket connections that prevent Playwright from closing the page cleanly — navigating to about:blank drops them before teardown.

2. External DB tests for operator deployments

  • Add postgres-cred to the Backstage CR's extraEnvs.secrets so the operator injects POSTGRES_* env vars automatically
  • Skip direct deployment env var patching in prepareForExternalDatabase for operator installs (operator reconciliation reverts direct patches)

3. CI reporting: pessimistic test status

Write a pessimistic default (failed=true, count=N/A) immediately after registering a test run. If Prow kills the job mid-test, STATUS_TEST_FAILED.txt and STATUS_NUMBER_OF_TEST_FAILED.txt still have entries for all registered test runs — preventing misaligned arrays that break Slack notifications.

save_status_test_failed and save_status_number_of_test_failed now regenerate the file from the in-memory array instead of appending, making them safe to call multiple times for the same deployment ID.

Jira: redhat.atlassian.net/browse/RHIDP-9140
Jira: redhat.atlassian.net/browse/RHIDP-9141

@zdrapela

Copy link
Copy Markdown
Member Author

/test e2e-ocp-operator-nightly
/test e2e-ocp-helm-nightly

@openshift-ci openshift-ci Bot requested review from HusneShabbir and psrna June 29, 2026 14:01
@codecov

codecov Bot commented Jun 29, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.77%. Comparing base (d8b493f) to head (01a6062).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5027      +/-   ##
==========================================
- Coverage   55.39%   54.77%   -0.62%     
==========================================
  Files         122      110      -12     
  Lines        2365     2147     -218     
  Branches      544      542       -2     
==========================================
- Hits         1310     1176     -134     
+ Misses       1049      969      -80     
+ Partials        6        2       -4     
Flag Coverage Δ
rhdh 54.77% <ø> (-0.62%) ⬇️

Continue to review full report in Codecov by Harness.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d8b493f...01a6062. Read the comment docs.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@zdrapela zdrapela marked this pull request as draft June 29, 2026 14:03
@rhdh-qodo-merge

Copy link
Copy Markdown

PR Summary by Qodo

Fix runtime e2e tests for operator external DB and CI status reporting

🐞 Bug fix 🧪 Tests ⚙️ Configuration changes 🕐 40+ Minutes

Grey Divider

AI Description

• Prevent Playwright teardown hangs by closing RHDH WebSockets via about:blank in afterEach.
• Fix external DB runtime tests for operator installs by relying on CR-managed secret env injection.
• Make CI test-status reporting resilient to job timeouts by writing pessimistic defaults and
 regenerating status files.
Diagram

graph TD
  CI["CI job (Prow)"] --> TSH["testing.sh"] --> TRT["test_run_tracker"] --> RSH["reporting.sh"] --> SF[("STATUS *.txt")]
  PW["Playwright specs"] --> BLK["afterEach: about:blank"]
  PW --> PGC["postgres-config.ts"] --> RCFG["runtime-config.ts"] --> OP["Backstage Operator"]
Loading
High-Level Assessment

The following are alternative approaches to this PR:

1. Centralize Playwright cleanup in a shared fixture/base test
  • ➕ Avoids repeating afterEach blocks across multiple spec files
  • ➕ Ensures consistent teardown behavior across the entire suite
  • ➖ Requires refactoring test structure/fixtures and may be more invasive
  • ➖ Harder to apply only to suites affected by WebSocket behavior
2. Write a single structured status artifact (e.g., JSON) instead of parallel text arrays
  • ➕ Eliminates array-length mismatches by construction
  • ➕ Simplifies downstream consumers and enables richer metadata
  • ➖ Bigger change to existing reporting and Slack integration expectations
  • ➖ Requires coordinated updates across multiple CI scripts/consumers
3. Patch operator Backstage CR on-the-fly for external DB env injection (instead of changing CR generator)
  • ➕ Limits the change to external DB test setup code paths
  • ➕ Keeps CR defaults minimal for non-external-DB runs
  • ➖ Adds runtime coupling to CR shape and operator reconciliation timing
  • ➖ More moving parts during tests; riskier failure modes in CI

Recommendation: The PR’s approach is the best incremental fix: suite-local afterEach cleanup solves the WebSocket teardown hang without test-body wrappers, operator/Helm branching aligns with reconciliation behavior, and regenerating status files makes pessimistic defaults safe and idempotent. Consider the structured status artifact as a future hardening step if reporting continues to evolve.

Files changed (8) +74 / -42

Bug fix (3) +43 / -10
testing.shWrite pessimistic test result immediately after registering a run +6/-0

Write pessimistic test result immediately after registering a run

• Marks tests as failed with count=N/A right after a deploy is marked successful. This ensures status artifacts remain aligned even if the job is killed or Playwright never reaches the final result write.

.ci/pipelines/lib/testing.sh

reporting.shRegenerate STATUS_* files from in-memory arrays (idempotent writes) +19/-4

Regenerate STATUS_* files from in-memory arrays (idempotent writes)

• Replaces append-only writes for test-failure status files with a helper that rewrites the full file from the associative array in sorted key order. This allows updating the same deployment ID multiple times (pessimistic default then final result) without corrupting downstream parsing.

.ci/pipelines/reporting.sh

postgres-config.tsSkip direct deployment env-var patching for operator installs +18/-6

Skip direct deployment env-var patching for operator installs

• Branches external DB preparation by install method: for operator installs, skips removal of schema-mode patched env vars and avoids patching POSTGRES_* env vars directly onto the Deployment (operator reconciliation would revert it). Keeps Helm behavior unchanged and logs the operator-specific path.

e2e-tests/playwright/utils/postgres-config.ts

Tests (4) +30 / -31
config-map.spec.tsMove about:blank navigation into test.afterEach +7/-5

Move about:blank navigation into test.afterEach

• Adds a suite-level afterEach hook that navigates to about:blank to drop WebSocket connections before teardown. Removes the per-test try/finally wrapper previously used solely for this cleanup.

e2e-tests/playwright/e2e/configuration-test/config-map.spec.ts

verify-tls-config-with-external-azure-db.spec.tsAdd suite-level about:blank cleanup for Azure external DB tests +8/-8

Add suite-level about:blank cleanup for Azure external DB tests

• Introduces an afterEach hook to navigate away from RHDH before Playwright teardown. Simplifies the connection verification test by removing the local try/finally cleanup wrapper.

e2e-tests/playwright/e2e/external-database/verify-tls-config-with-external-azure-db.spec.ts

verify-tls-config-with-external-rds.spec.tsAdd suite-level about:blank cleanup for RDS external DB tests +8/-8

Add suite-level about:blank cleanup for RDS external DB tests

• Introduces an afterEach hook to navigate away from RHDH before Playwright teardown. Simplifies the connection verification test by removing the local try/finally cleanup wrapper.

e2e-tests/playwright/e2e/external-database/verify-tls-config-with-external-rds.spec.ts

verify-schema-mode.spec.tsAdd afterEach teardown navigation and remove inline cleanup +7/-10

Add afterEach teardown navigation and remove inline cleanup

• Adds suite-level about:blank navigation in afterEach to avoid teardown hangs. Removes the per-test try/finally cleanup and leaves the test body focused on schema-mode assertions.

e2e-tests/playwright/e2e/plugin-division-mode-schema/verify-schema-mode.spec.ts

Other (1) +1 / -1
runtime-config.tsEnsure operator Backstage CR injects postgres-cred via extraEnvs.secrets +1/-1

Ensure operator Backstage CR injects postgres-cred via extraEnvs.secrets

• Extends Backstage CR generation to include the postgres-cred secret in extraEnvs.secrets. This enables operator-managed env var injection of POSTGRES_* keys required by external DB runtime tests.

e2e-tests/playwright/utils/runtime-config.ts

@rhdh-qodo-merge

rhdh-qodo-merge Bot commented Jun 29, 2026

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX issues (0) 🔗 Cross-repo conflicts (0) 📜 Skill insights (0)

Grey Divider


Informational

1. Duplicate POSTGRES_* env vars ✓ Resolved 🐞 Bug ☼ Reliability
Description
The operator runtime Backstage CR now injects postgres-cred as env vars, but schema-mode/operator
also relies on POSTGRES_* coming from the operator-managed DB secret; both secrets contain
overlapping POSTGRES_* keys. This makes the effective DB connection inputs ambiguous and can cause
schema-mode or external-DB tests to use incorrect credentials (e.g., placeholder values) depending
on which source wins in the generated Deployment env/envFrom ordering.
Code

e2e-tests/playwright/utils/runtime-config.ts[R389-392]

        extraEnvs: {
          envs,
-          secrets: [{ name: "rhdh-runtime-config" }],
+          secrets: [{ name: "rhdh-runtime-config" }, { name: "postgres-cred" }],
        },
Relevance

⭐ Low

Similar placeholder-secret/env-var conflict concerns were definitely rejected in PR #4649; team kept
placeholder/injection approach.

PR-#4649
PR-#4809

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
postgres-cred is now explicitly injected into the operator Backstage deployment, and it is always
created with placeholder POSTGRES_* values. Separately, schema-mode/operator documents that the
operator injects env vars from its managed DB secret and writes overlapping POSTGRES_* keys there,
while both schema-mode and external DB flows patch app-config to consume ${POSTGRES_*}. This
combination creates two competing sources for the same environment variables.

e2e-tests/playwright/utils/runtime-config.ts[379-393]
e2e-tests/playwright/utils/runtime-deploy.ts[67-83]
e2e-tests/playwright/e2e/plugin-division-mode-schema/schema-mode-setup.ts[106-156]
e2e-tests/playwright/e2e/plugin-division-mode-schema/schema-mode-setup.ts[219-255]
e2e-tests/playwright/utils/postgres-config.ts[291-333]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`postgres-cred` is now always injected into the operator runtime Backstage deployment via `spec.application.extraEnvs.secrets`. At the same time, schema-mode/operator setup also expects `POSTGRES_*` to be injected from the operator-managed DB secret and patches app-config to read `backend.database.connection` from `${POSTGRES_*}`. Since both secrets can define the same `POSTGRES_*` keys, the pod can end up with conflicting values.

### Issue Context
- `postgres-cred` is created with placeholder `POSTGRES_*` values and is intended to be overwritten later by external DB tests.
- Schema-mode/operator updates the operator-managed secret to include/override `POSTGRES_*` keys and relies on operator injection (no direct Deployment patching).
- Both schema-mode and external DB prep patch `app-config` to read DB connection fields from `${POSTGRES_*}`.

### How to fix (one of these approaches)
1. **Make `postgres-cred` the single canonical source of `POSTGRES_*` for operator runtime tests**:
  - In schema-mode/operator setup, write the needed `POSTGRES_*` keys into `postgres-cred` (instead of / in addition to the operator-managed secret) so that app-config placeholders always resolve from one source.
  - Avoid putting `POSTGRES_*` into the operator-managed secret if it’s only needed for Backstage env injection.

2. **Alternatively, avoid injecting `postgres-cred` except when running external DB tests**:
  - Patch the Backstage CR (not the Deployment) to add/remove `postgres-cred` in `extraEnvs.secrets` only for the external DB suite, to ensure there is never overlap during schema-mode runs.

### Fix Focus Areas
- e2e-tests/playwright/utils/runtime-config.ts[379-393]
- e2e-tests/playwright/utils/runtime-deploy.ts[67-83]
- e2e-tests/playwright/e2e/plugin-division-mode-schema/schema-mode-setup.ts[106-118]
- e2e-tests/playwright/e2e/plugin-division-mode-schema/schema-mode-setup.ts[219-255]
- e2e-tests/playwright/utils/postgres-config.ts[291-333]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

@github-actions

Copy link
Copy Markdown
Contributor

Image was built and published successfully. It is available at:

@zdrapela zdrapela force-pushed the fix/runtime-tests-operator-externaldb branch from c075f6e to 5317bd4 Compare June 30, 2026 12:16
@zdrapela

Copy link
Copy Markdown
Member Author

/test operator ocp

@github-actions

Copy link
Copy Markdown
Contributor

Image was built and published successfully. It is available at:

@zdrapela zdrapela force-pushed the fix/runtime-tests-operator-externaldb branch from 5317bd4 to 53c5379 Compare June 30, 2026 13:18
@zdrapela

Copy link
Copy Markdown
Member Author

/test e2e-ocp-operator-nightly

@github-actions

Copy link
Copy Markdown
Contributor

The container image build and publish workflows were skipped (either due to [skip-build] tag or no relevant changes with existing image).

@zdrapela

Copy link
Copy Markdown
Member Author

/test e2e-ocp-operator-nightly

@zdrapela zdrapela force-pushed the fix/runtime-tests-operator-externaldb branch from 53c5379 to 57668f0 Compare July 1, 2026 05:27
@zdrapela

zdrapela commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

/test e2e-ocp-operator-nightly

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Image was built and published successfully. It is available at:

@zdrapela zdrapela force-pushed the fix/runtime-tests-operator-externaldb branch from 57668f0 to b98a55d Compare July 1, 2026 09:04
@zdrapela

zdrapela commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

/test e2e-ocp-operator-nightly

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

The container image build and publish workflows were skipped (either due to [skip-build] tag or no relevant changes with existing image).

@zdrapela zdrapela force-pushed the fix/runtime-tests-operator-externaldb branch from b98a55d to 02b171a Compare July 1, 2026 10:29
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

The container image build and publish workflows were skipped (either due to [skip-build] tag or no relevant changes with existing image).

@zdrapela

zdrapela commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

/test e2e-ocp-operator-nightly

@zdrapela zdrapela force-pushed the fix/runtime-tests-operator-externaldb branch from 02b171a to aaedb2b Compare July 1, 2026 20:18
@zdrapela

zdrapela commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

/test e2e-ocp-operator-nightly

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

The container image build and publish workflows were skipped (either due to [skip-build] tag or no relevant changes with existing image).

@zdrapela

zdrapela commented Jul 2, 2026

Copy link
Copy Markdown
Member Author

/agentic_review

@rhdh-qodo-merge

Copy link
Copy Markdown

Qodo is busy working

Check back in a few minutes. Qodo's code review agents are on it.

Grey Divider

1 similar comment
@rhdh-qodo-merge

Copy link
Copy Markdown

Qodo is busy working

Check back in a few minutes. Qodo's code review agents are on it.

Grey Divider

@zdrapela

zdrapela commented Jul 2, 2026

Copy link
Copy Markdown
Member Author

/test e2e-ocp-helm

1 similar comment
@zdrapela

zdrapela commented Jul 2, 2026

Copy link
Copy Markdown
Member Author

/test e2e-ocp-helm

Comment thread .ci/pipelines/reporting.sh
Comment thread .ci/pipelines/lib/testing.sh Outdated
Comment thread e2e-tests/playwright/utils/postgres-config.ts Outdated
Comment thread e2e-tests/playwright/utils/postgres-config.ts Outdated
Comment thread e2e-tests/playwright/utils/runtime-config.ts
Comment thread e2e-tests/playwright/e2e/configuration-test/config-map.spec.ts Outdated
Comment thread e2e-tests/playwright/utils/helper.ts
zdrapela added 11 commits July 3, 2026 13:26
…rEach

Move the WebSocket teardown workaround (page.goto('about:blank')) from
per-test try/finally blocks into test.afterEach hooks. This is cleaner:
the cleanup is declared once per describe block and applies to all tests
that use the page fixture, without wrapping test bodies.

Assisted-by: OpenCode
For operator deployments, patching POSTGRES_* env vars directly onto the
deployment is reverted by operator reconciliation. The external DB tests
(Azure DB, RDS) configure postgres-cred secret with connection details,
but those values never reached the RHDH container as env vars.

Fix:
- Add postgres-cred to extraEnvs.secrets in the Backstage CR so the
  operator injects all its keys as env vars automatically
- Skip direct deployment env var patching in prepareForExternalDatabase
  for operator installs (same pattern as schema-mode setup)

Assisted-by: OpenCode
When Prow kills a job (2h timeout) or Playwright hangs, the
mark_test_result call at the end of testing::run_tests never executes.
This leaves STATUS_TEST_FAILED.txt and STATUS_NUMBER_OF_TEST_FAILED.txt
with fewer entries than STATUS_DEPLOYMENT_NAMESPACE.txt, creating
misaligned arrays that break downstream reporting (Slack notifications).

Fix: write a pessimistic default (failed=true, count=N/A) immediately
after registering the test run. The real result overwrites it after
Playwright completes. If the job is killed mid-test, the STATUS files
still have entries for all registered test runs.

To support this, save_status_test_failed and save_status_number_of_test_failed
now regenerate the file from the in-memory array instead of appending,
making them safe to call multiple times for the same deployment ID.

Assisted-by: OpenCode
The regenerate-from-array pattern requires all callers to run in the
same shell process because bash associative arrays are not inherited
by child processes. Add a comment documenting this constraint so it
travels with the code.

Also note the bash >= 4.3 requirement for local -n (nameref).

Assisted-by: OpenCode
Replace the two different sentinel strings ('N/A' and 'some') with a
single UNKNOWN_FAILURE_COUNT constant defined in test-run-tracker.sh.
Both meant 'failed but count unknown' — the downstream Slack consumer
now only has to handle one sentinel format.

Assisted-by: OpenCode
The secrets list in the Backstage CR merge-patch (postgres-config.ts)
was hardcoded separately from generateBackstageCR (runtime-config.ts).
If a secret were added to one but not the other, the patch would
silently strip it mid-test.

Extract the base list into a shared constant so the two stay in sync.

Assisted-by: OpenCode
Centralise the JSON merge-patch Content-Type header in a KubeClient
method, matching the existing jsonPatchDeployment and updateSecret
patterns. Simplifies the call site in addPostgresCredToBackstageCR
by removing the trailing positional undefined args.

Assisted-by: OpenCode
The suite-level afterEach requests the page fixture for every test,
including non-UI ones. This is an intentional tradeoff — the minor
overhead of an extra browser context is acceptable vs per-test
conditional logic.

Assisted-by: OpenCode
The runtime deployment uses a local ./dynamic-plugins/dist/ path for
the homepage plugin. Add a comment referencing redhat-developer#4909 so the OCI
migration sweep can find this spot.

Assisted-by: OpenCode
The try/catch only logged and rethrew — it existed to pair with the
finally block that was removed when about:blank cleanup moved to
afterEach. Playwright already reports errors with full stacks, so
the wrapper adds no value and costs an indent level.

Assisted-by: OpenCode
The verbose !== undefined && !== '' check is used instead of || because
oxlint's strict-boolean-expressions and prefer-nullish-coalescing rules
(pedantic category, set to error) reject || on string operands.

Assisted-by: OpenCode
@zdrapela zdrapela force-pushed the fix/runtime-tests-operator-externaldb branch from 0b55e3f to 01a6062 Compare July 3, 2026 11:26
@zdrapela

zdrapela commented Jul 3, 2026

Copy link
Copy Markdown
Member Author

/test e2e-ocp-operator-nightly

@sonarqubecloud

sonarqubecloud Bot commented Jul 3, 2026

Copy link
Copy Markdown

@zdrapela

zdrapela commented Jul 3, 2026

Copy link
Copy Markdown
Member Author

/test e2e-ocp-helm-nightly

@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Image was built and published successfully. It is available at:

@zdrapela

zdrapela commented Jul 3, 2026

Copy link
Copy Markdown
Member Author

/test e2e-ocp-helm

@gustavolira

Copy link
Copy Markdown
Member

/lgtm

@openshift-ci

openshift-ci Bot commented Jul 3, 2026

Copy link
Copy Markdown

@zdrapela: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-ocp-helm-nightly 01a6062 link false /test e2e-ocp-helm-nightly

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@zdrapela

zdrapela commented Jul 3, 2026

Copy link
Copy Markdown
Member Author

/test e2e-ocp-helm

@openshift-merge-bot openshift-merge-bot Bot merged commit 4002fc1 into redhat-developer:main Jul 3, 2026
24 of 25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants